In Search of a Perceptual Metric for Timbre: Dissimilarity Judgments among Synthetic Sounds with MFCC-Derived Spectral Envelopes

Authors

  • HIROKO TERASAWA
  • JONATHAN BERGER
  • SHOJI MAKINO
Abstract

This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, that is, the perception specifically related to the spectral element of timbre. Mel-cepstrum (Mel-frequency cepstral coefficients or MFCCs) is chosen as a hypothetical metric for spectral envelope perception due to its desirable properties of linearity, orthogonality, and multidimensionality. The experimental results confirmed the relevance of Mel-cepstrum to the perceived timbre dissimilarity when the spectral envelopes of complex-tone synthetic sounds were systematically controlled. The first experiment measured the perceived dissimilarity when the stimuli were synthesized by varying only a single coefficient from MFCC. Linear regression analysis proved that each of the 12 MFCCs has a linear correlation with spectral envelope perception. The second experiment measured the perceived dissimilarity when the stimuli were synthesized by varying two of the MFCCs. Multiple regression analysis showed that the perceived dissimilarity can be explained in terms of the Euclidean distance of the MFCC values of the synthetic sounds. The quantitative and perceptual relevance between the MFCCs and spectral centroids is also discussed. These results suggest that MFCCs can be a metric representation of spectral envelope perception, where each of its orthogonal basis functions provides a linear match with human perception.
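As a rough illustration of the analysis the abstract describes, the sketch below computes the Euclidean distance between two MFCC vectors and fits a straight line relating such distances to dissimilarity ratings. It is a minimal sketch, not the authors' procedure: the function names `mfcc_distance` and `fit_line` are hypothetical, and every number in the usage section is a made-up placeholder rather than data from the paper.

```python
import numpy as np


def mfcc_distance(c_a, c_b):
    """Euclidean distance between two MFCC vectors of equal length."""
    c_a = np.asarray(c_a, dtype=float)
    c_b = np.asarray(c_b, dtype=float)
    if c_a.shape != c_b.shape:
        raise ValueError("MFCC vectors must have the same shape")
    return float(np.linalg.norm(c_a - c_b))


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r_squared)


# Placeholder stimuli: each row holds the two MFCC values that were varied.
# These values are illustrative assumptions, not the paper's stimuli.
reference = np.array([0.0, 0.0])
stimuli = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [2.0, 2.0]])
distances = [mfcc_distance(reference, s) for s in stimuli]

# Placeholder ratings generated from an exact linear rule, purely so the
# fit has something to recover; real ratings would come from listeners.
ratings = [2.0 * d + 1.0 for d in distances]

slope, intercept, r_squared = fit_line(distances, ratings)
```

With real listener ratings in place of the placeholders, `r_squared` would indicate how much of the variance in perceived dissimilarity the MFCC distance accounts for.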


Similar resources

Multidimensional scaling of synthetic musical timbre: perception of spectral and temporal characteristics.

The perceptual correlates of acoustic parameters involved in musical timbre were investigated by examining judgements of timbre dissimilarity. Nine synthetic sounds were created, derived from crossing three levels of spectral and temporal parameters (number of harmonics and rise time, respectively). Two separate conditions were tested, one using single tones, the other using short melodies. Fif...

A timbre space for speech

We describe a perceptual space for timbre, define an objective metric that takes into account perceptual orthogonality and measure the quality of timbre interpolation. We discuss two timbre representations and measure perceptual judgments. We determine that a timbre space based on Mel-frequency cepstral coefficients (MFCC) is a good model for perceptual timbre space.

Determining the Euclidean Distance Between Two Steady State Sounds

We describe a perceptual space for timbre, define an objective metric that takes into account perceptual orthogonality and measure the quality of timbre interpolation. We discuss two timbre representations and, using these two representations, measure perceived relationships between pairs of sounds on an equivalent range of timbre variety. We determine that a timbre space based on Mel-frequency c...

A statistical model of timbre perception

We describe a perceptual space for timbre, define an objective metric that takes into account perceptual orthogonality and measure the quality of timbre interpolation. We discuss two timbre representations and measure perceptual judgments on an equivalent range of timbre variety. We determine that a timbre space based on Mel-frequency cepstral coefficients (MFCC) is a good model for a perceptua...

Perceptual categorization of sound spectral envelopes reflected in auditory-evoked N1m.

Magnetic responses to periodic complex sounds with equivalent acoustic parameters except for two different fundamental frequencies (F0) and 12 different spectral envelopes of vocal, instrumental, and linear shapes were recorded to determine the cortical representation of timbre categorization in humans. Responses at approximately 100 ms (N1m) to vocal and instrumental (nonlinear) sounds were lo...


Publication date: 2012